Auto-tuning TensorFlow Threading Model for CPU Backend
TensorFlow is a popular deep learning framework used by data scientists to
solve a wide range of machine learning and deep learning problems, such as image
classification and speech recognition. It also operates at a large scale and in
heterogeneous environments --- it allows users to train neural network models
or deploy them for inference using GPUs, CPUs and deep learning specific
custom-designed hardware such as TPUs. Even though TensorFlow supports a
variety of optimized backends, realizing the best performance using a backend
may require additional efforts. For instance, getting the best performance from
a CPU backend requires careful tuning of its threading model. Unfortunately,
the best tuning approach used today is manual, tedious, time-consuming, and,
more importantly, may not guarantee the best performance.
In this paper, we develop an automatic approach, called TensorTuner, to
search for optimal parameter settings of TensorFlow's threading model for CPU
backends. We evaluate TensorTuner on both Eigen and Intel's MKL CPU backends
using a set of neural networks from TensorFlow's benchmarking suite. Our
evaluation results demonstrate that the parameter settings found by TensorTuner
produce 2% to 123% performance improvement for the Eigen CPU backend and 1.5%
to 28% performance improvement for the MKL CPU backend over the performance
obtained using their best-known parameter settings. This highlights that the
default parameter settings in the Eigen CPU backend are not ideal, and that
even for a carefully hand-tuned MKL backend the settings may be
sub-optimal. Our evaluations also revealed that TensorTuner is efficient at
finding the optimal settings --- it is able to converge to the optimal settings
quickly by pruning more than 90% of the parameter search space.
Comment: Paper presented at the Machine Learning in HPC Environments workshop held along with SuperComputing 2018, Dallas, Texas.
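As a rough illustration of the tuning problem (not TensorTuner's actual search algorithm, and with a synthetic throughput surface standing in for real benchmark runs), a coarse-to-fine search over the two main threading knobs (TensorFlow exposes them as intra_op_parallelism_threads and inter_op_parallelism_threads) might look like:

```cpp
#include <algorithm>

// Hypothetical stand-in for one benchmark run: throughput as a function of
// the intra-op and inter-op thread counts (the real tuner measures an
// actual model; this synthetic surface peaks at intra=16, inter=2).
double throughput(int intra, int inter) {
    return 100.0 - (intra - 16) * (intra - 16)
                 - 10.0 * (inter - 2) * (inter - 2);
}

struct Setting { int intra; int inter; double score; };

// Coarse-to-fine search: scan a coarse grid first, then refine around the
// best point found, pruning most of the full cross-product of settings.
Setting tune(int max_threads) {
    Setting best{1, 1, throughput(1, 1)};
    for (int intra = 2; intra <= max_threads; intra += 4)   // coarse pass
        for (int inter = 1; inter <= 4; ++inter)
            if (double s = throughput(intra, inter); s > best.score)
                best = {intra, inter, s};
    int lo = std::max(1, best.intra - 3);                   // refinement
    int hi = std::min(max_threads, best.intra + 3);
    for (int intra = lo; intra <= hi; ++intra)
        if (double s = throughput(intra, best.inter); s > best.score)
            best = {intra, best.inter, s};
    return best;
}
```

The pruning benefit is the point: the coarse pass plus a local refinement evaluates far fewer settings than the exhaustive grid, which is the behavior the abstract reports at much larger scale.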
ControlFlag: A Self-supervised Idiosyncratic Pattern Detection System for Software Control Structures
Software debugging has been shown to utilize upwards of 50% of developers’ time. Machine programming, the field concerned with the automation of software (and hardware) development, has recently made progress in both research and production-quality automated debugging systems. In this paper, we present ControlFlag, a system that detects possible idiosyncratic violations in software control structures. ControlFlag also suggests possible corrections in the event a true error is detected. A novelty of ControlFlag is that it is entirely self-supervised; that is, it requires no labels to learn about the potential idiosyncratic programming pattern violations. In addition to presenting ControlFlag’s design, we also provide an abbreviated experimental evaluation.
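A classic example of such a violation is writing `if (x = 7)` where `if (x == 7)` was intended. The self-supervised idea can be sketched in a deliberately simplified form, counting how often each abstracted control-expression pattern occurs in a corpus and flagging rare ones (ControlFlag's real representation is parse-tree-based and far richer; these strings are a stand-in):

```cpp
#include <map>
#include <string>
#include <vector>

// Toy version of self-supervised pattern mining: tally abstracted
// if-condition patterns seen in a "corpus", then flag patterns whose
// frequency falls below a threshold as possible idiosyncratic errors.
// No labels are needed; the corpus itself defines what is "normal".
std::vector<std::string> flag_rare(const std::vector<std::string>& corpus,
                                   int min_count) {
    std::map<std::string, int> freq;
    for (const auto& pat : corpus) ++freq[pat];
    std::vector<std::string> flagged;
    for (const auto& [pat, n] : freq)
        if (n < min_count) flagged.push_back(pat);
    return flagged;
}
```

A corpus dominated by the abstracted pattern `if (VAR == CONST)` with a single occurrence of `if (VAR = CONST)` would flag the latter, mirroring the assignment-versus-comparison mistake above.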
Quantifying OpenMP: Statistical Insights into Usage and Adoption
In high-performance computing (HPC), the demand for efficient parallel
programming models has grown dramatically since the end of Dennard Scaling and
the subsequent move to multi-core CPUs. OpenMP stands out as a popular choice
due to its simplicity and portability, offering a directive-driven approach for
shared-memory parallel programming. Despite its wide adoption, however, there
is a lack of comprehensive data on the actual usage of OpenMP constructs,
hindering unbiased insights into its popularity and evolution. This paper
presents a statistical analysis of OpenMP usage and adoption trends based on a
novel and extensive database, HPCORPUS, compiled from GitHub repositories
containing C, C++, and Fortran code. The results reveal that OpenMP is the
dominant parallel programming model, accounting for 45% of all analyzed
parallel APIs. Furthermore, it has demonstrated steady and continuous growth in
popularity over the past decade. Analyzing specific OpenMP constructs, the
study provides in-depth insights into their usage patterns and preferences
across the three languages. Notably, we found that while OpenMP usage
concentrates in a strong "common core" of constructs (the rest of the API is
used far less), new adoption trends are emerging as well, such as the simd and
target directives for accelerated computing and task for irregular parallelism.
Overall, this study sheds light on OpenMP's significance in HPC applications
and provides valuable data for researchers and practitioners. It showcases
OpenMP's versatility, evolving adoption, and relevance in contemporary parallel
programming, underlining its continued role in HPC applications and beyond.
These statistical insights are essential for making informed decisions about
parallelization strategies and provide a foundation for further advancements in
parallel programming models and techniques.
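For readers unfamiliar with the constructs discussed, a minimal example of the "common core" (a parallel loop with a reduction) alongside one of the newer trends (simd) might look like the following; without an OpenMP-enabled compiler the pragmas are simply ignored and the code runs serially with the same result:

```cpp
#include <cstddef>
#include <vector>

// Common-core OpenMP: a work-shared loop with a reduction clause.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// One of the newer adoption trends noted above: simd for vectorization.
void scale(std::vector<double>& v, double c) {
    #pragma omp simd
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= c;
}
```

The directive-driven style is what the abstract credits for OpenMP's popularity: the serial code is untouched, and parallelism is layered on with annotations.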
CompCodeVet: A Compiler-guided Validation and Enhancement Approach for Code Dataset
Large language models (LLMs) have become increasingly prominent in academia
and industry due to their remarkable performance in diverse applications. As
these models evolve with increasing parameters, they excel in tasks like
sentiment analysis and machine translation. However, even models with billions
of parameters face challenges in tasks demanding multi-step reasoning. Code
generation and comprehension, especially in C and C++, emerge as significant
challenges. While LLMs trained on code datasets demonstrate competence in many
tasks, they struggle with rectifying non-compilable C and C++ code. Our
investigation attributes this subpar performance to two primary factors: the
quality of the training dataset and the inherent complexity of the problem
which demands intricate reasoning. Existing "Chain of Thought" (CoT) prompting
techniques aim to enhance multi-step reasoning; however, they still inherit
the latent drawbacks of LLMs. In this work, we
propose CompCodeVet, a compiler-guided CoT approach to produce compilable code
from non-compilable code. Diverging from the conventional approach of utilizing
larger LLMs, we employ compilers as a teacher to establish a more robust
zero-shot thought process. The evaluation of CompCodeVet on two open-source
code datasets shows that CompCodeVet has the ability to improve the training
dataset quality for LLMs.
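The compiler-in-the-loop idea can be sketched as a generic repair cycle. The interface below is hypothetical (CompCodeVet's actual pipeline drives real compilers and an LLM); here the caller supplies a `compile` callback returning diagnostics and a `fixer` callback standing in for the model:

```cpp
#include <functional>
#include <string>
#include <vector>

// Sketch of a compiler-as-teacher repair loop: compile, and if diagnostics
// are produced, feed them back to a fixer for another attempt, up to a
// budget. Names and the loop bound are illustrative only.
std::string repair_loop(
    std::string code,
    const std::function<std::vector<std::string>(const std::string&)>& compile,
    const std::function<std::string(const std::string&,
                                    const std::vector<std::string>&)>& fixer,
    int max_rounds = 3) {
    for (int round = 0; round < max_rounds; ++round) {
        auto diags = compile(code);
        if (diags.empty()) return code;  // compilable: done
        code = fixer(code, diags);       // diagnostics guide the next attempt
    }
    return code;                         // best effort after budget exhausted
}
```

The zero-shot aspect in the abstract corresponds to the fixer receiving only the code and the compiler's diagnostics, with no labeled repair examples.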
Scope is all you need: Transforming LLMs for HPC Code
With easier access to powerful compute resources, there is a growing trend in
the field of AI for software development to develop larger and larger language
models (LLMs) to address a variety of programming tasks. Even LLMs applied to
tasks from the high-performance computing (HPC) domain are huge in size (e.g.,
billions of parameters) and demand expensive compute resources for training. We
found this design choice confusing - why do we need large LLMs trained on
natural languages and programming languages unrelated to HPC for HPC-specific
tasks? In this line of work, we aim to question design choices made by existing
LLMs by developing smaller LLMs for specific domains - we call them
domain-specific LLMs. Specifically, we start off with HPC as a domain and
propose a novel tokenizer named Tokompiler, designed specifically for
preprocessing code in HPC and compilation-centric tasks. Tokompiler leverages
knowledge of language primitives to generate language-oriented tokens,
providing a context-aware understanding of code structure while completely
avoiding the human semantics attributed to code structures. We applied Tokompiler to
pre-train two state-of-the-art models, SPT-Code and Polycoder, for a Fortran
code corpus mined from GitHub. We evaluate the performance of these models
against the conventional LLMs. Results demonstrate that Tokompiler
significantly enhances code completion accuracy and semantic understanding
compared to traditional tokenizers in normalized-perplexity tests, reaching a
normalized perplexity of ~1. This research opens avenues for further advancements in
domain-specific LLMs, catering to the unique demands of HPC and compilation
tasks.
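One ingredient of such language-oriented tokenization, anonymizing user identifiers so a model cannot lean on human naming, can be sketched as follows (a toy illustration of the idea, not Tokompiler's actual algorithm):

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Replace user identifiers with numbered placeholders (VAR_0, VAR_1, ...)
// while leaving language keywords and operators untouched, so repeated
// uses of the same name map to the same anonymous token.
std::vector<std::string> anonymize(const std::vector<std::string>& tokens,
                                   const std::set<std::string>& keywords) {
    std::map<std::string, std::string> renames;
    std::vector<std::string> out;
    for (const auto& tok : tokens) {
        bool is_ident = !tok.empty() &&
                        std::isalpha(static_cast<unsigned char>(tok[0])) &&
                        keywords.count(tok) == 0;
        if (is_ident) {
            auto it = renames
                          .try_emplace(tok, "VAR_" +
                                                std::to_string(renames.size()))
                          .first;
            out.push_back(it->second);
        } else {
            out.push_back(tok);
        }
    }
    return out;
}
```

On a Fortran-like token stream such as `do i = i + n end`, the identifiers `i` and `n` become `VAR_0` and `VAR_1` while `do` and `end` survive, preserving structure without naming semantics.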
MISIM: A Novel Code Similarity System
Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems
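For contrast with learned systems like MISIM, a purely syntactic baseline, Jaccard similarity over token sets, illustrates what such systems improve on: renamed-but-equivalent code scores poorly here, which is exactly the gap semantic approaches target. (This is an illustrative baseline, not any of the systems named above.)

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Jaccard similarity over whitespace-separated token sets:
// |intersection| / |union|, in [0, 1]. Purely syntactic.
double jaccard(const std::string& a, const std::string& b) {
    auto tokens = [](const std::string& s) {
        std::set<std::string> t;
        std::istringstream in(s);
        for (std::string tok; in >> tok;) t.insert(tok);
        return t;
    };
    std::set<std::string> ta = tokens(a), tb = tokens(b), both;
    for (const auto& t : ta)
        if (tb.count(t)) both.insert(t);
    std::size_t uni = ta.size() + tb.size() - both.size();
    return uni ? static_cast<double>(both.size()) / uni : 1.0;
}
```

Two snippets that differ only in identifier names share almost no tokens under this measure, even though their semantics are identical; closing that gap is the motivation for structure-aware representations like the one MISIM proposes.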
Devil is Virtual: Reversing Virtual Inheritance in C++ Binaries
Complexities that arise from implementation of object-oriented concepts in
C++ such as virtual dispatch and dynamic type casting have attracted the
attention of attackers and defenders alike.
Binary-level defenses depend on full and precise recovery of the class
inheritance tree of a given program.
While current solutions focus on recovering single and multiple inheritances
from the binary, they are oblivious to virtual inheritance. Conventional wisdom
among binary-level defenses is that virtual inheritance is uncommon and/or
support for single and multiple inheritances provides implicit support for
virtual inheritance. In this paper, we show neither to be true.
Specifically, (1) we present an efficient technique to detect virtual
inheritance in C++ binaries and show through a study that virtual inheritance
can be found in a non-negligible number (more than 10% on Linux and 12.5% on
Windows) of real-world C++ programs, including MySQL and libstdc++. (2) We show
that failure to handle virtual inheritance introduces both false positives and
false negatives in the hierarchy tree. These false positives and negatives
either introduce attack surface when the hierarchy recovered is used to enforce
CFI policies, or make the hierarchy difficult to understand when it is needed
for program understanding (e.g., during decompilation). (3) We present a
solution to recover virtual inheritance from COTS binaries. We recover a
maximum of 95% and 95.5% (GCC -O0) and a minimum of 77.5% and 73.8% (Clang
-O2) of virtual and intermediate bases respectively in the virtual inheritance
tree.
Comment: Accepted at CCS20. This is a technical report version.
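The C++ feature at issue is the classic diamond: marking the base `virtual` makes B and C share a single A subobject inside D, which compilers implement via vtable/vbase offsets, the very metadata such binary-level recovery must model. A minimal example:

```cpp
// Diamond hierarchy with virtual inheritance: B and C each virtually
// inherit A, so D contains exactly one A subobject. Without `virtual`,
// D would hold two copies of A and `d.x` would be ambiguous.
struct A { int x = 0; };
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};

int shared_base_demo() {
    D d;
    d.x = 42;      // unambiguous: one shared A inside D
    B& b = d;
    C& c = d;
    // Both inheritance paths reach the same A subobject.
    return (&b.x == &c.x) ? d.x : -1;
}
```

At the binary level, locating that shared A through B and C requires following the compiler-emitted virtual-base offsets, which is why a recovery tool that ignores virtual inheritance mislinks the hierarchy.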